AITopics | dataset condensation

Collaborating Authors

dataset condensation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

Neural Information Processing Systems, Jun-16-2026, 14:30:03 GMT

Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensed dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce SNEAKDOOR, which enhances stealthiness without compromising attack effectiveness. SNEAKDOOR exploits the inherent vulnerability of class decision boundaries and incorporates a generative module that constructs input-aware triggers aligned with local feature geometry, thereby minimizing detectability. This joint design enables the attack to remain imperceptible to both human inspection and statistical detection. Extensive experiments across multiple datasets demonstrate that SNEAKDOOR achieves a compelling balance among attack success rate, clean test accuracy, and stealthiness, substantially improving the invisibility of both the synthetic data and triggered samples while maintaining high attack efficacy.
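The abstract weighs attack success rate (ASR) against clean test accuracy (CTA). As a rough illustration, here is how these two metrics are commonly computed from model predictions. This is a minimal sketch, not SNEAKDOOR's evaluation code; the function names and the convention of excluding inputs whose true class already equals the target class from the ASR denominator are assumptions.

```python
from typing import Sequence

def clean_test_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) test inputs that are classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def attack_success_rate(triggered_preds: Sequence[int], labels: Sequence[int], target: int) -> float:
    """Fraction of triggered inputs that the model assigns to the attacker's target class.
    Assumed convention: inputs whose true class already is the target are excluded."""
    kept = [p for p, y in zip(triggered_preds, labels) if y != target]
    return sum(p == target for p in kept) / len(kept)

# Hypothetical predictions for five test images.
labels          = [0, 1, 2, 1, 0]
clean_preds     = [0, 1, 2, 2, 0]   # one mistake on clean inputs
triggered_preds = [1, 1, 1, 1, 2]   # the same images with a trigger applied
cta = clean_test_accuracy(clean_preds, labels)                 # 4 / 5 = 0.8
asr = attack_success_rate(triggered_preds, labels, target=1)   # 2 / 3
print(f"CTA = {cta:.2f}, ASR = {asr:.2f}")
```

A stealthy attack in the abstract's sense tries to keep CTA close to that of a clean model while pushing ASR high, without visible artifacts in the synthetic data or the triggered inputs.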

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)


MIM4DD: Mutual Information Maximization for Dataset Distillation

Neural Information Processing Systems, Apr-25-2026, 23:44:23 GMT

A.1 Invariance of Mutual Information. Theorem 1 (Invariance of Mutual Information): Mutual information is invariant under reparametrization of the marginal variables.
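For continuous variables this invariance is usually stated for smooth, invertible maps x′ = f(x) and y′ = g(y). Its discrete analogue is that relabeling the values of X and Y bijectively leaves I(X; Y) unchanged. The sketch below checks that analogue numerically on a hypothetical joint distribution and contrasts it with a non-injective map, which (by the data processing inequality) can only lose information. It is an illustration under these assumptions, not MIM4DD's proof.

```python
import math

def mutual_information(joint):
    """I(X; Y) in nats for a discrete joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def push_forward(joint, f, g):
    """Joint pmf of (f(X), g(Y)); probability mass is merged when f or g is not injective."""
    out = {}
    for (x, y), p in joint.items():
        key = (f(x), g(y))
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical joint pmf over X in {0, 1, 2} and Y in {0, 1}.
joint = {(0, 0): 0.30, (0, 1): 0.05,
         (1, 0): 0.10, (1, 1): 0.25,
         (2, 0): 0.05, (2, 1): 0.25}

i_xy = mutual_information(joint)
# Bijective reparametrization of both marginals: x -> 10 - 3x, y -> 1 - y.
i_bijective = mutual_information(push_forward(joint, lambda x: 10 - 3 * x, lambda y: 1 - y))
# Non-injective map that merges X = 1 and X = 2.
i_merged = mutual_information(push_forward(joint, lambda x: min(x, 1), lambda y: y))
print(f"I(X;Y) = {i_xy:.4f}, after bijection = {i_bijective:.4f}, after merging = {i_merged:.4f}")
```

The bijective relabeling reproduces I(X; Y) exactly, while merging two values of X with different conditional distributions of Y strictly lowers it.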

artificial intelligence, dataset, machine learning, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)


DC-BENCH: Dataset Condensation Benchmark

Neural Information Processing Systems, Apr-24-2026, 09:16:34 GMT

Dataset condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow increasingly large, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and remains an open issue. The quality of a condensed dataset is often confounded with many other factors that critically affect end performance, such as data augmentation and model architecture. The lack of a systematic way to evaluate and compare condensation methods not only hinders our understanding of existing techniques but also discourages practical use of the synthesized datasets. This work provides the first large-scale standardized benchmark for dataset condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of the datasets they generate. Leveraging this benchmark, we conduct a large-scale study of current condensation methods and report many insightful findings that open up new possibilities for future development. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced to facilitate future research and application.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

Neural Information Processing Systems, Mar-22-2026, 18:37:07 GMT

The objective of dataset condensation is to ensure that a model trained on the synthetic dataset performs comparably to a model trained on the full dataset. However, existing methods predominantly concentrate on classification tasks, which makes them difficult to adapt to time series forecasting (TS-forecasting). This challenge arises from a difference in how synthetic data is evaluated. In classification, the synthetic data is considered well-distilled if the model trained on the full dataset and the model trained on the synthetic dataset yield identical labels for the same input, regardless of differences in their output logit distributions. In TS-forecasting, by contrast, the effectiveness of synthetic data distillation is determined by the distance between the predictions of the two models: the synthetic data is deemed well-distilled only when every data point within the predictions is similar.
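The contrast between the two evaluation criteria can be made concrete. The sketch below compares a full-data model and a synthetic-data model under both views: label agreement for classification and a prediction distance for forecasting. Using mean squared error as the distance, and all outputs shown, are assumptions for illustration; this is not the CondTSF plugin itself.

```python
def argmax(row):
    """Index of the largest entry in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def label_agreement(logits_full, logits_syn):
    """Classification view: share of inputs on which both models predict the same label."""
    same = [argmax(a) == argmax(b) for a, b in zip(logits_full, logits_syn)]
    return sum(same) / len(same)

def forecast_mse(pred_full, pred_syn):
    """Forecasting view (assumed distance): mean squared gap between the two models'
    forecasts, averaged over every series and every horizon step."""
    sq = [(a - b) ** 2
          for row_a, row_b in zip(pred_full, pred_syn)
          for a, b in zip(row_a, row_b)]
    return sum(sq) / len(sq)

# Hypothetical outputs of the full-data model and the synthetic-data model.
logits_full = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
logits_syn  = [[0.9, 0.8, 0.1], [-2.0, 0.3, 0.2]]   # very different logits, same argmax
forecast_full = [[1.0, 1.25, 1.5, 1.75], [0.5, 0.25, 0.0, -0.25]]
forecast_syn  = [[1.0, 1.25, 1.5, 2.75], [0.5, 0.25, 0.0, -0.25]]  # last step off by 1.0

agree = label_agreement(logits_full, logits_syn)   # 1.0
gap = forecast_mse(forecast_full, forecast_syn)    # 1.0 / 8 = 0.125
print(f"label agreement = {agree}, forecast MSE = {gap}")
```

Under the classification criterion the two models look identical despite very different logits, while a single off-target forecast step already registers as a nonzero gap under the forecasting criterion.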

artificial intelligence, machine learning, proceedings, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)


Elucidating the Design Space of Dataset Condensation

Neural Information Processing Systems, Mar-22-2026, 03:31:06 GMT

Dataset condensation, a concept within data-centric learning, aims to efficiently transfer critical attributes from an original dataset to a synthetic version while maintaining both the diversity and realism of the synthesized data. This approach can significantly improve model training efficiency and is adaptable to multiple application areas. Previous dataset condensation methods have faced several challenges: some incur high computational costs that limit their scalability to larger datasets (e.g., MTT, DREAM, and TESLA), while others are restricted to suboptimal design spaces, which can hinder potential improvements, especially on smaller datasets (e.g., SRe²L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design-centric framework that includes specific, effective strategies such as implementing soft category-aware matching, adjusting the learning rate schedule, and applying a small batch size. These strategies are grounded in both empirical evidence and theoretical backing.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)


CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

Neural Information Processing Systems, Feb-18-2026, 12:43:27 GMT

data mining, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Africa > Togo (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)


Fetch and Forge: Efficient Dataset Condensation for Object Detection (Ding Qi, Jian Li)

Neural Information Processing Systems, Feb-18-2026, 08:04:23 GMT

Dataset condensation is crucial for accelerating network training and reducing data storage requirements.

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Elucidating the Design Space of Dataset Condensation

Neural Information Processing Systems, Feb-17-2026, 14:30:15 GMT

Elucidate Dataset Condensation (EDC) establishes a benchmark for both small- and large-scale dataset condensation.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


Supplementary Materials for "Private Set Generation with Discriminative Information"

Neural Information Processing Systems, Feb-9-2026, 07:55:13 GMT

To compute the privacy cost of our approach, we numerically compute D_α(M(D) ‖ M(D′)) in Definition A.1 for a range of orders α [9, 14] in each training step that requires access to the real gradient g_θ^D. Compared with normal non-private training, most of the additional memory and computation cost is introduced by the DP-SGD [1] step (for the per-sample gradient computation) that sanitizes the parameter gradient on real data, while the other steps (including the update on S and the updates of F(·; θ) on S) are equivalent to multiple calls of the normal non-private forward and backward passes, whose costs are of lower magnitude than the DP-SGD step.

GS-WGAN [3]: We adopt the default configuration provided by the official implementation (ε = 10): subsampling rate = 1/1000, DP noise scale σ = 1.07, batch size = 32. Following [3], we pretrain (warm-start) the model for 2K iterations and subsequently train for 20K iterations.

The experiments presented in Section 5.2 of the main paper correspond to the class-incremental learning setting [10], where the data partition at each stage contains data from disjoint subsets of label classes.
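The accounting described above evaluates a Rényi divergence at several orders α, sums it over the private steps, and converts the result to an (ε, δ) guarantee. A minimal sketch of that pipeline follows, assuming Definition A.1 is the usual Rényi differential privacy definition and each private step is a Gaussian mechanism with L2 sensitivity 1. It deliberately omits privacy amplification by subsampling, so for subsampled training (such as the 1/1000 rate quoted in the text above) it overstates ε; it is not the supplement's actual accountant. The δ value and the grid of orders are assumptions.

```python
import math

def gaussian_rdp(alpha: float, sigma: float) -> float:
    """Rényi divergence of order alpha between N(0, sigma^2) and N(1, sigma^2),
    i.e. the RDP of a Gaussian mechanism with L2 sensitivity 1."""
    return alpha / (2.0 * sigma ** 2)

def epsilon_from_rdp(steps: int, sigma: float, delta: float, orders) -> float:
    """Compose `steps` Gaussian releases (RDP adds up per order), convert each order
    to (epsilon, delta)-DP, and keep the tightest epsilon.
    NOTE: no amplification by subsampling is applied (a simplifying assumption)."""
    best = math.inf
    for alpha in orders:
        if alpha <= 1:
            continue  # the RDP-to-DP conversion requires alpha > 1
        eps = steps * gaussian_rdp(alpha, sigma) + math.log(1.0 / delta) / (alpha - 1.0)
        best = min(best, eps)
    return best

# sigma = 1.07 matches the GS-WGAN noise scale in the text; delta and orders are assumed.
orders = [1.5, 2, 3, 4, 6, 8, 16, 32, 64]
eps_1 = epsilon_from_rdp(steps=1, sigma=1.07, delta=1e-5, orders=orders)
eps_100 = epsilon_from_rdp(steps=100, sigma=1.07, delta=1e-5, orders=orders)
print(f"1 step: eps ~ {eps_1:.2f}; 100 steps: eps ~ {eps_100:.2f}")
```

Minimizing over a grid of orders mirrors the practice of evaluating D_α at several α and reporting the best resulting bound; a production accountant would replace `gaussian_rdp` with the RDP of the subsampled Gaussian mechanism.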

artificial intelligence, machine learning, private set generation with discriminative information, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

